Characterization and phylogenetic analysis of the complete mitochondrial genome sequence of Photinia serratifolia

Plant mitochondrial genomes (mitogenomes) are a valuable source of genetic information for understanding phylogenetic relationships. However, no mitogenome of any species in the genus Photinia has been reported. In this study, we used next-generation sequencing (NGS) to assemble and annotate the mitogenome of Photinia serratifolia, which is 473,579 bp in length and contains 38 protein-coding genes, 23 tRNAs, and 6 rRNAs; 61 of these genes have no introns. The rps2 and rps11 genes are missing in the P. serratifolia mitogenome. Although there are more editing sites (488) in the P. serratifolia mitogenome than in most angiosperms, fewer editing types were found, showing a clear bias in RNA editing. Phylogenetic analysis based on the mitogenomes of P. serratifolia and 8 other taxa of the Rosaceae family reflected the evolutionary and taxonomic status of P. serratifolia. Ka/Ks analysis revealed that 72.69% of the protein-coding genes in the P. serratifolia mitogenome had undergone negative selection, reflecting the importance of those genes in the P. serratifolia mitogenome. Collectively, these results provide valuable information on the evolution of P. serratifolia and insight into the evolutionary relationships within Photinia and the Rosaceae family.

significant variations in size and structural organization 20 . For example, the genome size can vary from 66 kb in Viscum scurruloideum 21 to 11.3 Mb in Silene conica 22 . The number of protein-coding genes varies from 33 in Arabidopsis thaliana 23 to 74 in Vitis vinifera 24 . The number of tRNA genes varies from 3 in Rosa chinensis 25 to 31 in V. vinifera 24 . In addition, the plant mitogenome has numerous repetitive sequences and multiple RNA-editing modifications 26 . In contrast to the conserved structure of plant chloroplast genomes, mitogenomes vary not only between plant species but also within the same species 12,17,22 . For these reasons, mitogenomes have been used in many phylogenetic studies as a valuable source of genetic information and for the investigation of essential cellular processes. However, the characteristics of plant mitogenomes (larger size, greater structural complexity, and low conservation across species) make plant mitogenome assembly difficult 13,14 . Fortunately, advancements in long-read sequencing, such as PacBio and Oxford Nanopore, have made organelle genome sequencing easier and faster.
Recently, the complete chloroplast genomes of Photinia × fraseri, Photinia davidsoniae, and Photinia glabra have been sequenced and published 1,27,28 . However, at present, no mitogenome of any species in Photinia has been reported. Therefore, in this study, we constructed the complete mitogenome of P. serratifolia based on Oxford Nanopore and Illumina data, performed a phylogenetic analysis, and compared the complete mitogenome of P. serratifolia with those of related species in the family. Our results will help better understand the features of the P. serratifolia mitogenome and lay the foundation for identifying further evolutionary relationships within Rosaceae.

Results and discussion
Sequencing and genome structure of the complete mitogenome of P. serratifolia. The total DNA of P. serratifolia was sequenced, and the raw data were prepared for assembly, yielding 115.88 G of Nanopore PromethION sequencing data with an average read length of 23,654 bp (… 706 bp) and 34.3 G of Illumina sequencing data (Supplementary Table S1). We then assembled the complete mitogenome of P. serratifolia into a circular contig of 473,579 bp (Fig. 1), which has been deposited in the NCBI Genome Database (GenBank accession number: MZ153172). The mitogenomes of 19 species were selected for analysis in this study (Supplementary Table S2). It is well known that the plant mitogenome varies greatly in size, from 66 kb in V. scurruloideum 21 to 11.3 Mb in S. conica 22 .
Gene contents of the mitogenome of P. serratifolia. Although the size of plant mitogenomes varies greatly, the number of mitochondrial genes is relatively conserved in the land plant lineage, with 60-80 known genes found in different terrestrial plant species 29 . In the P. serratifolia mitogenome, 67 genes (38 protein-coding genes, 23 tRNA genes, and 6 rRNA genes) were annotated (Supplementary Table S2). The functional categorization and physical locations of the annotated genes are shown in Fig. 1. The 38 encoded proteins (nad6 and atp1 have two copies) can be divided into 11 classes: ATP synthase (6), cytochrome c biogenesis (4), ubiquinol cytochrome c reductase (1), cytochrome c oxidase (3), maturases (1), transport membrane protein (1), NADH dehydrogenase (10), ribosomal proteins (large subunit (LSU); 3), ribosomal proteins (small subunit (SSU); 6), succinate dehydrogenase (2), and ribonuclease (1) (Supplementary Table S3).
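The gene counts reported above can be cross-checked with a short tally. The following is only an illustrative sketch: the dictionary keys are shorthand labels chosen here, and the numbers are copied directly from the text.

```python
# Class sizes of the protein-coding genes as reported in the text.
# The labels are shorthand chosen for this sketch, not official annotations.
PCG_CLASSES = {
    "ATP synthase": 6,
    "cytochrome c biogenesis": 4,
    "ubiquinol cytochrome c reductase": 1,
    "cytochrome c oxidase": 3,
    "maturases": 1,
    "transport membrane protein": 1,
    "NADH dehydrogenase": 10,
    "ribosomal proteins (LSU)": 3,
    "ribosomal proteins (SSU)": 6,
    "succinate dehydrogenase": 2,
    "ribonuclease": 1,
}

# Gene types annotated in the mitogenome, as reported in the text.
GENE_TYPES = {"protein-coding": 38, "tRNA": 23, "rRNA": 6}


def total(counts):
    """Sum the values of a label -> count mapping."""
    return sum(counts.values())


print(len(PCG_CLASSES), "classes,", total(PCG_CLASSES), "protein-coding genes")
print(total(GENE_TYPES), "annotated genes in total")
```

The class sizes add up to the 38 protein-coding genes, and the three gene types add up to the 67 annotated genes.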
Although comparative analyses of mitogenomes have shown that the sequences of protein-coding genes are highly conserved in plants, the variations among plant mitogenomes characterized so far have mainly been reported in the ribosomal proteins 30,31 . In addition, the composition of cytochrome c biogenesis genes has also been reported to differ among plant mitogenomes 32 . Interestingly, consistent with previous mitogenome studies of Rosaceae 33 , most rps genes (rps2, rps7, rps10, rps11, rps19) were missing in the mitogenome of P. serratifolia (Fig. 2). The functions of the missing ribosomal genes may be replaced by nuclear genes, which may be related to the rapid radiation of Rosaceae plants 34 . Although there was no significant variation in the composition of cytochrome c synthase genes among the other species of the Rosaceae family in our study, the lengths of ccmFc, ccmFn, cob, cox1, cox2, and cox3 in the mitogenomes of P. serratifolia, R. bibas, and M. hupehensis were 797-2271 bp, considerably longer than those in other species of the family (212-587 bp).
Other than ribosomal proteins, the major variations characterized among plant mitogenomes, even within the same genus, are in the tRNA gene contents 30 . The P. serratifolia mitogenome had 23 tRNAs (Supplementary Table S3). The lengths of these tRNAs ranged from 71 to 87 bp, with a total length of 1725 bp (Supplementary Table S3). The P. serratifolia mitogenome had more tRNAs than other species of the Rosaceae family, such as R. bibas (22), M. domestica (20), P. avium (16), and S. torminalis (18) (Supplementary Table S2). This may be because some tRNAs in the P. serratifolia mitogenome have multiple copies. For example, trnfM-CAT and trnF-GAA have two copies. In species with fewer mitochondrial tRNAs, the function of the missing mitochondrial tRNAs may be replaced by chloroplast-derived tRNAs 34 . Moreover, consistent with the previous report 35 , we found that the number of protein-coding genes in P. serratifolia did not increase along with the number of tRNAs.
Repeat sequences analysis. SSRs, or microsatellites, are DNA stretches consisting of short, tandemly repeated units of 1-6 base pairs in length 36 . In the current study, we identified 59 SSRs in the P. serratifolia mitogenome. The proportions of different repeat units are shown in Fig. 3. As in all observed species, mononucleotide repeats were the most abundant SSR type in P. serratifolia, constituting 79.67% (47 repeats) of all identified SSRs. In addition, there were 7 dinucleotide SSRs (11.86%) and 5 trinucleotide SSRs (8.47%). However, no tetra-, penta-, or hexanucleotide repeats were identified in the P. serratifolia mitogenome. Mononucleotide repeats of A/T motifs (a total of 41 repeats) were the most recurrent motifs, representing 69.49% of all identified SSRs (Supplementary Table S4). In line with the trend that the distribution pattern of microsatellites is consistent with phylogenetic status in plants 37 , the SSR composition of P. serratifolia was similar to that of its most closely related species, such as R. bibas and P. betulifolia (Fig. 3).
… Table S5). The repetitive sequence in the P. serratifolia mitogenome was 51.05 kb, accounting for 10.78% of the mitogenome. The proportion of repeats is higher than that in Garcinia mangostana (5.8%) 38 and Prunus salicina (7.22%) 39 , but lower than that in Nicotiana tabacum (13%) 40 and Daucus carota (16%) 41 . The … Table S5). The distribution of repeats is consistent with many plant mitogenomes that have one or more pairs of large repeats 38,42,43 . Some reports showed that large and medium-sized repeats can act as sites for inter- or intramolecular recombination, leading to multiple alternative arrangements or isoforms 42,43 . Although the frequency of recombination events was low, all the sequencing reads were aligned to the P. serratifolia mitogenome to detect potential alternative isoforms. A benefit of Nanopore PromethION sequencing is that the ultra-long reads of P. serratifolia, with an average read length of 23,654 bp, are longer than the identified repeats. Therefore, the long reads can cover the identified repeats with high probability. As shown in Fig. 5, the read coverage of these repeats is similar to that of non-repetitive sequences, which implies that there are no branching nodes in any repeat. Therefore, the P. serratifolia mitochondrial master genome assembly can be represented in circular form, as previously reported for plant mitogenomes 20,38,44 .
Figure 2. Distribution of protein-coding genes in plant mitogenomes. Yellow, green, and purple boxes indicate that one, two, and three copies exist in the plant mitogenome, respectively. White boxes indicate that the gene is missing in the plant mitogenome. The circles, squares, and triangles represent dicots, monocots, and gymnosperms, respectively. Plant names in red are species from the Rosaceae family.
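The coverage comparison described above can be made concrete as a ratio between the mean read depth inside a repeat and the mean depth of its flanking regions. The sketch below is a hypothetical helper, not the procedure used in this study: the function name, the flank width, and the per-base depth list are all assumptions made for illustration. A ratio close to 1 would be consistent with no branching node, whereas a ratio near 0.5 would be the kind of drop expected if reads split between two configurations.

```python
from statistics import mean


def repeat_coverage_ratio(depth, repeat_start, repeat_end, flank=1000):
    """Return mean depth inside [repeat_start, repeat_end) divided by the
    mean depth of up to `flank` bases on each side.

    `depth` is a list of per-base read depths (0-based positions).
    The flank width of 1000 bp is an arbitrary choice for this sketch.
    """
    inside = depth[repeat_start:repeat_end]
    left = depth[max(0, repeat_start - flank):repeat_start]
    right = depth[repeat_end:repeat_end + flank]
    flanks = left + right
    if not inside or not flanks:
        raise ValueError("repeat or flanking region is empty")
    flank_mean = mean(flanks)
    if flank_mean == 0:
        raise ValueError("flanking regions have zero coverage")
    return mean(inside) / flank_mean


# Toy example: uniform coverage versus a repeat covered at half depth.
uniform = [100] * 3000
halved = [100] * 1000 + [50] * 1000 + [100] * 1000
print(repeat_coverage_ratio(uniform, 1000, 2000))
print(repeat_coverage_ratio(halved, 1000, 2000))
```

In practice, the per-base depths would come from the read alignment, and the threshold separating "similar" from "reduced" coverage would need to be chosen with the depth variance of the data in mind.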
However, a total length of 88,247 bp lies between the two copies of the long repeats (Fig. 4a), which may give rise to alternative configurations of the mitogenome via inversions of these long repeats in the master conformation (Fig. 4b,c).
www.nature.com/scientificreports/
… 45 . We predicted 488 RNA-editing sites within the 33 protein-coding genes (Fig. 6) in the P. serratifolia mitogenome, which was similar to the numbers in A. thaliana (441 sites) 15 , Eucalyptus grandis (470 sites) 46 , and Citrullus lanatus (463 sites) 47 and lower than those in gymnosperms with larger mitogenomes, such as Taxus cuspidata (974 sites), Pinus taeda (1179 sites), Cycas revoluta (1206 sites), and G. biloba (1306 sites) 48 . However, whether the number of RNA-editing sites is positively correlated with the size of the mitogenome requires further research. The selection of mitochondrial RNA-editing sites in P. serratifolia shows a high degree of compositional bias. As shown in Fig. 6, all RNA-editing sites are of the C-T editing type, which is consistent with the fact that C-T is the most common editing type found in plant mitogenomes 49,50 . Inconsistent with previous studies 50 , more than half (313 sites, 64.14%) of the mitochondrial RNA editing occurred at the second codon position in P. serratifolia (Fig. 6), followed by the first codon position (161 sites; 32.99%) (Fig. 6). However, no editing site was found at the third codon position, consistent with the fact that RNA-editing sites at this position are rare in plant mitogenomes 48,49 . Although the P. serratifolia mitogenome has more RNA-editing sites, and the vast majority of RNA editing occurs at the first or second position of codons, there were only 30 codon transfer types, corresponding to 14 amino acid transfer types, suggesting a consolidated biological function.
The types of transfer are comparable to those of most gymnosperms (30-40 codons; around 20 amino acids) 48,50 but fewer than those of monocotyledonous and dicotyledonous plants (50-60 codons; around 30 amino acids) 46,47,49 . Among the 30 codon transfer types, TCA => TTA was the most common, with 68 sites. A tendency toward leucine after RNA editing was found in the amino acids of the predicted editing codons, supported by the fact that 44.88% (219 sites) of the edits resulted in leucine. After RNA editing, 32.0% of the amino acids remained hydrophobic. However, 46.3% of the amino acids were predicted to change from hydrophilic to hydrophobic, while 8.6% were predicted to change from hydrophobic to hydrophilic. Overall, our study suggests that the P. serratifolia mitogenome has more RNA-editing sites but fewer editing types.
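To illustrate how a single C-T edit translates into a codon and amino acid transfer type, the sketch below applies the edit at a given codon position and translates both codons. It is not the PREP-Mt prediction method; the function name is hypothetical, and the standard genetic code is assumed for translation.

```python
# Standard genetic code (assumed here), built from the conventional
# compact layout with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def edit_c_to_t(codon, position):
    """Apply a C-to-T edit at a 1-based codon position.

    Returns (edited_codon, amino_acid_before, amino_acid_after)
    using one-letter amino acid codes.
    """
    codon = codon.upper()
    if len(codon) != 3 or position not in (1, 2, 3):
        raise ValueError("expected a 3-base codon and a position of 1, 2, or 3")
    if codon[position - 1] != "C":
        raise ValueError("no C at the requested position")
    edited = codon[:position - 1] + "T" + codon[position:]
    return edited, CODON_TABLE[codon], CODON_TABLE[edited]


# The most common transfer type reported above: TCA => TTA (second position).
print(edit_c_to_t("TCA", 2))
```

For TCA edited at the second position, the codon becomes TTA and the encoded residue changes from serine (S) to leucine (L), which matches the leucine tendency described above.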
It has been well established that RNA editing is an epitranscriptomic mechanism that modifies primary RNAs and is widespread in plant organelles 51 . Fig. 7 shows the total number of editing sites in all 33 protein-coding genes. Although the extent of RNA editing varies between plant species 52 , similar to most angiosperms 50 , ribosomal proteins (except rps4) and ATPase subunits (except atp6) had a relatively small number of RNA-editing-derived substitutions (2-11 sites), while the transcripts of NADH dehydrogenase subunits and cytochrome c biogenesis genes were significantly edited (13-39 sites; Fig. 7) in the P. serratifolia mitogenome. Consistent with previous reports, such as those for Phaseolus vulgaris 26 and Suaeda glauca 53 , nad4 (36 sites), ccmFn (39 sites), and ccmB (31 sites) had the highest total number of RNA-editing sites predicted in the P. serratifolia mitogenome (Fig. 7). This supports the essential role of editing sites in the proper functioning of mitochondrially encoded proteins.
Codon usage and Ka/Ks analysis. As shown in Supplementary Table S3, in the P. serratifolia mitogenome, ATG was used as the start codon by almost all the protein-coding genes, while mttB starts with TTG, and rpl16 and rps4 start with GTA. Three types of stop codons, TAA, TGA, and TAG, were found in the P. serratifolia mitogenome, with utilization rates of 44.7%, 31.6%, and 23.7%, respectively (Supplementary Table S3). The relative synonymous codon usage (RSCU) value for P. serratifolia for the third codon position is shown in Fig. 8. Consistent with most of the currently studied mitogenomes 10,53,54 , the use of both two- and four-fold degenerate codons was biased toward codons abundant in A or T. In P. serratifolia, 14,333 amino acids were encoded. The most frequently used amino acids were Leu (7.1%), Arg (6.3%), and Ser (6.1%), and the least common amino acids were Trp (1.4%) and Met (1%) (Fig. 8).
In genetics, the ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates (Ka/Ks) is useful for inferring the direction and magnitude of natural selection across diverged species 55 . A Ka/Ks ratio < 1 implies negative selection, a ratio > 1 implies positive selection (driving change), and a ratio of exactly 1 indicates neutral evolution. To evaluate selective pressures during the evolutionary dynamics of protein-coding genes among closely related species, the Ka/Ks ratios of 17 single-copy PCGs between the mitogenomes of P. serratifolia and 7 Rosaceae species were calculated. As shown in Fig. 9, there was no substitution in most mitochondrial genes, such as rpl5, rps13, rps14, nad3, nad4L, atp9, ccmB, and cox1, between P. serratifolia and the other seven Rosaceae species. More frequent changes were found in atp genes among species.
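The interpretation rule above can be written as a small classifier. This is a minimal sketch with a hypothetical function name; the tolerance used to decide that a ratio equals 1 is an assumption, and pairs with Ks = 0 (such as genes with no substitutions) are reported as undefined rather than forced into a category.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Classify a gene pair by its Ka/Ks ratio.

    Returns "negative" (ratio < 1), "positive" (ratio > 1),
    "neutral" (ratio equal to 1 within `tol`), or "undefined"
    when Ks is zero and the ratio cannot be computed.
    """
    if ks == 0:
        return "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "negative"


# Toy values, not taken from the study.
for ka, ks in [(0.01, 0.05), (0.05, 0.01), (0.02, 0.02), (0.0, 0.0)]:
    print(ka, ks, classify_selection(ka, ks))
```

Treating the zero-Ks case separately matters here because many of the compared genes show no substitution at all, so a ratio cannot be assigned to them.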
In 21 cases (Fig. 9), the Ka/Ks values of P. serratifolia gene-specific substitution rates were higher than 1. This result suggests positive selection during the evolution of P. serratifolia as compared with the 7 other species 55,56 . Among these cases, the nad genes stood out, with Ka/Ks > 1 in 7 cases for nad7 and 4 cases for nad3, suggesting large variation and positive selection during nad gene evolution among Rosaceae 55 . However, most genes had undergone negative selection pressure during evolution, supported by the fact that the Ka/Ks values of 86 protein-coding genes, accounting for 72.69% of the protein-coding genes, were less than 1 compared to the other plant species. Taken together, these results suggest that mitochondrial genes are highly conserved during the evolutionary process in Rosaceae plants.
… (Fig. 10) were analyzed using the concatenated dataset of 17 PCGs through ML phylogenetic analysis. The abbreviations and accession numbers of the mitogenomes investigated in this study are listed in Supplementary Table S2. As shown in Fig. 10, the outgroup G. biloba, a gymnosperm, was distinct from the angiosperms. Moreover, the taxa of the 7 Rosaceae species were well clustered. Within the Rosaceae cluster, P. avium, which belongs to the Amygdaleae subfamily, was distinct from the other 7 species of the Maleae subfamily, which also supports the classification of the Amygdaleae and Maleae subfamilies 57,58 . Meanwhile, species in the same genus were clustered together, such as S. aucuparia and S. torminalis, and M. hupehensis and M. domestica, which is consistent with previous reports based on morphological and genetic data 57-59 .
In addition, we found that P. serratifolia and P. betulifolia were united in one clade (Fig. 10). The present phylogenetic analysis shows that R. bibas is sister to P. serratifolia + P. betulifolia, which is consistent with the previous report 60 . Our results also support the groupings (Sorbus + (Malus + (Rhaphiolepis + (Photinia + Pyrus)))), which have been partly supported in the previous study 61 . However, more accurate sequences and increased taxon sampling are necessary to further investigate the monophyly of these genera at the mitogenome level. In general, the phylogenetic tree topology was in line with the evolutionary relationships among those species, indicating the consistency of traditional taxonomy with the molecular classification.

Conclusions
In conclusion, the current study presents the first mitogenome assembly and annotation of P. serratifolia, which is also the first mitogenome in the genus Photinia. The mitogenome was 473,579 bp in length, containing 38 protein-coding genes, 23 transfer RNA genes, and 6 ribosomal RNA genes. Comparative analysis of gene structure, codon usage, repeat regions, and RNA-editing sites showed that the rps2 and rps11 genes were missing and that a clear bias of RNA-editing sites exists in the P. serratifolia mitogenome. Furthermore, the Ka/Ks analysis based on codon substitution revealed that most of the coding genes had undergone negative selection, indicating the conservation of mitochondrial genes during evolution. Moreover, phylogenetic analysis based on the mitogenomes of P. serratifolia and 8 other taxa indicates consistency between molecular and taxonomic classification. These results will help in better understanding the features of the P. serratifolia mitogenome and lay the foundation for identifying further evolutionary relationships within Rosaceae.

Materials and methods
Mitochondrial DNA isolation and genome sequencing.
Genome assembly. In this study, raw data from second-generation sequencing were filtered using fastp version 0.20.0 (https://github.com/OpenGene/fastp) 63 . The third-generation sequencing data of mitochondrial reads were error-corrected, trimmed, and de novo assembled using the Canu assembler (version 1.5) with default parameters 64 , yielding the contig sequences. The contigs were compared against the plant mitochondrial gene databases published in the NCBI using blast v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and contigs that matched mitochondrial genes were selected as seed sequences. The original data were used to extend and circularize the contigs to obtain the ring-dominant structure (or secondary ring), and the assembly was then polished using NextPolish 1.3.1 (https://github.com/Nextomics/NextPolish) 65 . … results were calibrated using second- and third-generation data, and the parameters were set as rerun = 3 and -max_depth = 100. Then, the final assembly results were obtained.
Genome annotation. The assembled P. serratifolia mitogenome was annotated using the GeSeq tool 66 and MITOFY 47 . To confirm the annotated results, the assembled P. serratifolia mitogenome was also BLAST-searched against protein-coding genes and ribosomal RNA (rRNA) genes of available plant mitogenomes at the NCBI. Then, the sequence coordinates of the identified protein-coding genes (PCGs) were manually verified for start and stop codons. The annotations of transfer RNA (tRNA) genes were also confirmed by tRNAscan-SE 2.0 67 . ViennaRNA-2.4.14 was used to visualize the secondary structure of tRNA 68 . The possible RNA-editing sites in the PCGs of P. serratifolia were predicted using the online predictive RNA editor for plant mitochondrial genes (PREP-Mt) suite of servers (http://prep.unl.edu/) 69 . The codon frequencies were calculated using the Codon Usage tool in the Sequence Manipulation Suite (bioinformatics.org/sms2/codon_usage.html) 70 . The relative synonymous codon usage (RSCU) was calculated using the CAI Python package of Lee 71 . The physical circular map was drawn using the Organellar Genome DRAW (OGDraw) v1.2 program 72 . The final annotated mitogenome sequences of P. serratifolia have been deposited in the NCBI GenBank (accession no. MZ153172).
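For readers unfamiliar with RSCU, the quantity can be computed directly from codon counts: for each codon, the observed count is divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below follows this commonly used definition and is not the interface of the CAI package cited above; the function name is hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict

# Standard genetic code (assumed), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(coding_seqs):
    """Return a dict mapping codon -> RSCU over all in-frame codons.

    RSCU = observed count * (number of synonymous codons) / (total count
    of that amino acid). Amino acids that never occur are skipped.
    """
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / family_total
    return values


# Toy coding sequence: three GCT codons and one GCA codon (all alanine).
print(rscu(["GCTGCTGCTGCA"]))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, which is how the A/T bias at the third codon position would show up in such a table.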
Analysis of repeated sequence. Simple sequence repeats and tandem repeats were detected in the P. serratifolia mitogenome. The MIcroSAtellite (MISA) identification tool Perl script was used to detect simple sequence repeats 73 . Repeats of mono-, di-, tri-, tetra-, penta-, and hexanucleotide units were identified with minimum repeat numbers of 10, 6, 5, 5, 5, and 5, respectively. ROUSfinder was used for the identification of repeat elements 74 . Subsequently, to explore whether the identified repeats lead to the formation of multiple mitogenome isoforms, all the sequencing reads were aligned to the P. serratifolia mitogenome using Geneious Basic 75 . If there is a branching node in a repeat, the coverage of reads at the branch will be halved 76 . In this way, evidence of recombination mediated by repeat sequences could be observed directly.
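The SSR thresholds listed above can be illustrated with a simple scan for perfect tandem repeats. The sketch below is not MISA and does not reproduce its handling of compound or interrupted repeats; the function names are hypothetical, and only the unit sizes and minimum repeat numbers are taken from the text.

```python
# Minimum repeat numbers per unit size, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (for example, 'ATAT' is not primitive because it is 'AT' twice)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return a sorted list of (start, motif, repeat_count) for perfect
    SSRs meeting the thresholds; start positions are 0-based."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            if not set(motif) <= set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Extend the run while the next unit matches the motif.
            j = i + unit_len
            while seq[j:j + unit_len] == motif:
                j += unit_len
            repeats = (j - i) // unit_len
            if repeats >= min_rep:
                hits.append((i, motif, repeats))
                i = j  # skip past the reported repeat
            else:
                i += 1
    return sorted(hits)


# Toy sequence with one A(12) run and one (AT)6 run.
toy = "ACGT" + "A" * 12 + "CG" + "AT" * 6 + "GGC"
print(find_ssrs(toy))
```

Requiring a primitive motif prevents a long mononucleotide run from being reported again as a di- or trinucleotide repeat.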

Data availability
The complete mitochondrial genome of P. serratifolia has been submitted to the NCBI database (https://www.ncbi.nlm.nih.gov/) under the accession number MZ153172.